0. Introduction to Psychic

This tutorial will cover some basic usage of the Psychic toolbox. This contain a wealth of useful EEG processing functions that were developed during the writing of the PhD thesises of Boris Reuderink and Marijn van Vliet.

The Magic Trick

In this tutorial we will do some simple EEG data analysis in order to 'read' a subjects mind. This experiment is playfully called the "magic trick". The subject was sitting in front of a screen and was presented with 9 playing cards:

1: 'Ace of spades'
2: 'Jack of clubs'
3: 'Queen of hearts'
4: 'King of diamonds'
5: '10 of spaces'
6: '3 of clubs'
7: '10 of hearts'
8: '3 of diamonds'
9: 'King of spades'

He picked one of these cards and kept it in his mind. Next, the 9 playing cards would flash one-by-one in a random order across the screen. Each card was presented a total of 30 times. The subject would mentally count the number of times his card would appear on the screen (which was 30 if he was paying attention, we are not interested in the answer he got, it just helps keep the subject focused on the cards).

In this tutorial we will analyse the average response to each card. The card that the subject had in mind should produce a larger response than the others.

First off, let's download the data. At the moment, it is stored in a public dropbox folder. Execute the code below by pressing ctrl+enter. It will take some time to run, depending on the speed of your internet connection.

NOTE: The data used in this tutorial is EEG data that has been bandpass filtered with a 3rd order Butterworth filter with a passband of 0.5-30 Hz. This results in relatively clean looking data. When doing ERP analysis on other data, you will have to filter it yourself. Don't do ERP analysis on non-filtered, non-baselined data!



In [13]:

    
import urllib
import scipy.io

# Download the data from dropbox
urllib.urlretrieve('https://dl.dropboxusercontent.com/u/79303435/tutorial1-01.mat?dl=1', 'tutorial1-01-filtered.mat');

# Load it using the scipy module
m = scipy.io.loadmat('tutorial1-01-filtered.mat')
EEG = m['EEG']
labels = m['labels'].astype(int)

# Print some information about the data
print 'EEG dimensions:', EEG.shape
print 'Label dimensions:', labels.shape









    



EEG dimensions: (7, 288349)
Label dimensions: (1, 288349)

Introducing the `DataSet` class

The most important class used in Psychic is the DataSet class. It is the datastructure in which we will store the EEG data and any relevant metadata, such as channel names, classes, etc.

Take a look at the documentation first:



In [14]:

    
import psychic
psychic.DataSet?

The psychic.DataSet data structure consists of a list of instances. What constitutes a single instance of data is up to you as the user. In the case of raw EEG data, it makes sense to treat each sample as one instance.

The psychic.DataSet data structure has only one required field: data, which, as you can guess, contains the actual n-dimensional data as a numpy array. Iterating over the last dimension of this array should iterate over the instances. In this case, it will contain the EEG signal recorded at each electrode.

Other non-required, but very useful fields are:

labels contains the class labels for each instance. There are several ways to label instances. In this case, we label each EEG sample with an integer value given to us by the EEG trigger cable.
ids contains a unique identifier for each instance. In this case we set it to the time in seconds at which each sample was recorded.
feat_lab can be used to assign feature labels. Where ids gives labels to the instances, feat_lab gives labels to the other dimensions of data. In this case we can use it to set the channels names, so we can index by channel later and make plots with the correct channel names.



In [34]:

    
data = EEG
labels = labels
ids = np.arange(EEG.shape[1]) / 2048. # Data was recorded at 2048 Hz.
feat_lab = ['Fz', 'Cz', 'Pz', 'CP1', 'CP2', 'C3', 'C4']

d = psychic.DataSet(data, labels, ids, feat_lab=feat_lab)
print d









    



DataSet with 288349 instances, 7 features [7], 10 classes: [288079, 30, 30, 30, 30, 30, 30, 30, 30, 30], extras: []

The trigger cable has send out 10 different values. If you know how the trigger cable works, you know that it sends out zeros when nothing interesting is happening (so most samples are labeled with 0). The other values were emitted to mark the onsets of the presentation of the nine playing cards.

Below are some other useful properties of the DataSet class:



In [16]:

    
print 'The dataset contains', d.ninstances, 'EEG samples (instances).'
print 'The dataset contains', d.nfeatures, 'channels (features).'
print 'The names of the features are:', d.feat_lab[0]

# Psychic contains a function to deduce the sample rate of the recording using the DataSet.ids member:
print 'The sample rate of the recording is:', psychic.get_samplerate(d), 'Hz.'









    



The dataset contains 288349 EEG samples (instances).
The dataset contains 7 channels (features).
The names of the features are: ['Fz', 'Cz', 'Pz', 'CP1', 'CP2', 'C3', 'C4']
The sample rate of the recording is: 2048.0 Hz.

Now that we have everything there is to know about this data in one datastructure, we can use Psychic to display it:



In [17]:

    
psychic.plot_eeg(d)









    Out[17]:

The running EEG is plotted horizontally with the channels stacked on top of each other. A vertical line (color coded to correspond to each of the 9 classes) is drawn whenever an event occured. The plot can be tweaked to your liking by passing more parameters to the psychic.plot_eeg function, see the documentation (psychic.plot_eeg?) for that.

Selecting data

Drawing the entire dataset at once makes for a very cluttered plot. Plotting only the first 10 seconds should be more managable. The psychic.DataSet class implements the __getitem__ function to select instances, meaning that we can use Python's [ ] syntax to select them. Currently, each EEG sample is one instance. Since the data is recorded at 2048 Hz, 10 seconds worth of data is 20480 samples:



In [18]:

    
psychic.plot_eeg(d[:20480]) # Using Python's slice syntax









    Out[18]:

When we want to select different features as opposed to instances (in our case select different channels as opposed to samples), the psychic.DataSet class provides a member variable DataSet.ix that implements the __getitem__ method that treats the dataset as a n-dimensional array: (features x instances):



In [42]:

    
psychic.plot_eeg(d.ix[:2,:20480])









    Out[42]:

Similarly, it has a member called DataSet.lix, which does the same, but performs a lookup on the feat_lab and I member variables first, so we can select data like this:



In [19]:

    
# Select channels using labels, select first 10 seconds
psychic.plot_eeg(d.lix[ ['Cz', 'Fz'], :10 ])









    Out[19]:

Note that DataSet.lix is clever with floats. Say the timestamps of the EEG samples are:

[9.8293, 9.9205, 10.083]

and we request the range [:10]. Instead of trying to find the index of the exact value 10 (which doesn't exist and would generate an error) it will interpret the command as a request for everything 'until but not including 10' and select the first two items, which are both less than 10.

Slicing up the data into trials

We are interested in the response generated whenever a card was shown, so we cut one-second-long pieces of EEG signal that start from the moment a card was shown. These pieces are called 'trials'. The Psychic module provides a node called psychic.nodes.Slice to do this.

Most of the functionality in Psychic comes in the form of nodes, modeled after how scikit-learn does things. We first instantiate our tool with the proper parameters, then fit the tool to our data with the (train()), and finally apply it (apply()).

The parameters to the Slice node are:

a dictionary that maps event codes to class names
a tuple (begin, end) that marks the beginning and the end of the slice in seconds, relative to the event onset

After slicing, you generally want to perform baseline correction on the pre-stimulus interval, which is provided by the psychic.nodes.Baseline node.



In [44]:

    
# Dictionary mapping the event codes to class labels
mdict = {
    1: 'Ace of spades',
    2: 'Jack of clubs',
    3: 'Queen of hearts',
    4: 'King of diamonds',
    5: '10 of spaces',
    6: '3 of clubs',
    7: '10 of hearts',
    8: '3 of diamonds',
    9: 'King of spades',
}

# Slicing up the dataset
slicer = psychic.nodes.Slice(mdict, (-0.1, 1.0))
slicer.train(d)
trials = slicer.apply(d)

# Performing baseline correction on the (-0.1 to 0) interval
baseliner = psychic.nodes.Baseline((-0.1, 0))

# There is also a shortcut to train and apply on the same data:
trials = baseliner.train_apply(trials)

# Print a summary of the resulting dataset
print trials









    



DataSet with 270 instances, 15764 features [7x2252], 9 classes: [30, 30, 30, 30, 30, 30, 30, 30, 30], extras: []

The data is now organized a little differently in the dataset and has become more complicated, since we now have to deal with higher dimensional data (channels x samples x trials) and class labels.

Each instance in the dataset is no longer a single EEG sample, but a trial:



In [38]:

    
print 'There are', trials.ninstances, 'trials.'









    



There are 270 trials.

Each trial has (channels x samples) = (7 x 2252) = 15764 features. The trials.data is now 3-dimensional:



In [39]:

    
print 'The shape of trials.data is (channels x samples x trials):', trials.data.shape
print 'The shape of the features is (channels x samples):', trials.feat_shape









    



The shape of trials.data is (channels x samples x trials): (7, 2252, 270)
The shape of the features is (channels x samples): (7, 2252)

The feature labels are now a list of lists: a list of channel names and a list of timestamps for the EEG samples in each trial:



In [40]:

    
print 'The channel names are:', trials.feat_lab[0]
print 'The EEG timestamps are:', trials.feat_lab[1][:10] # limiting output to 10 of the 2252 timestamps









    



The channel names are: ['Fz', 'Cz', 'Pz', 'CP1', 'CP2', 'C3', 'C4']
The EEG timestamps are: [-0.099609375, -0.09912109375, -0.0986328125, -0.09814453125, -0.09765625, -0.09716796875, -0.0966796875, -0.09619140625, -0.095703125, -0.09521484375]

trials.ids contains timestamps for the beginning of each trial:



In [41]:

    
print 'Beginning of each trial:', trials.ids[0,:10] # limiting output to 10 of the 270 timestamps









    



Beginning of each trial: [ 3.80322266  4.29199219  4.79199219  5.29199219  5.79199219  6.29199219
  6.79199219  7.29199219  7.79199219  8.29199219]

The labels trials.labels are now a (classes x instances) matrix of boolean values. Each column corresponds to an instance, the rows correspond to each class, containing True if the instance belongs to this class, otherwise False. An instance can belong to multiple classes:



In [25]:

    
print 'The shape of trials.labels is:', trials.labels.shape
print 'Here is a small portion of it:'
print trials.labels[:,:10]









    



The shape of trials.labels is: (9, 270)
Here is a small portion of it:
[[False False  True False False False False False False False]
 [False False False False False False  True False False False]
 [False  True False False False False False False False False]
 [False False False False False  True False False False False]
 [False False False False  True False False False False False]
 [False False False False False False False  True False False]
 [False False False False False False False False  True False]
 [False False False  True False False False False False  True]
 [ True False False False False False False False False False]]

trials.cl_lab contains the class labels belonging to each row in trials.labels. trials.ninstances_per_class contains the number of trials belonging to each class:



In [42]:

    
print 'The names of the cards that were shown are:', trials.cl_lab
print 'Number of times each card was shown:', trials.ninstances_per_class









    



The names of the cards that were shown are: ['10 of hearts', '10 of spaces', '3 of clubs', '3 of diamonds', 'Ace of spades', 'Jack of clubs', 'King of diamonds', 'King of spades', 'Queen of hearts']
Number of times each card was shown: [30, 30, 30, 30, 30, 30, 30, 30, 30]

Plotting the ERP

The powerful visualization function psychic.plot_erp will plot the ERP. The vspace parameter will control the vertical spacing of the channels. Of omitted a value will be chosen to ensure no overlap, but it is generally something you want to adjust yourself:



In [43]:

    
psychic.plot_erp(trials, vspace=20)









    Out[43]:

If you are plotting interactively, you can drag around the legend with the mouse in order to get it out of the way.

By default, only 8 colors are defined. Because there are 9 classes in this dataset, the function needs to reuse the first color (blue) for the last class. Now both the ERPs for the 10 of hearts and the Queen of hearts have the same color. As you can see from the documentation (psychic.plot_erp?), we can use the colors parameter to specify colors for each class:



In [47]:

    
# Lets give each response a different color
colors = ['k', 'b', 'g', 'y', 'm', 'r', 'c', '#ffff00', '#aaaaaa']
psychic.plot_erp(trials, vspace=25, colors=colors)









    Out[47]: